Group 20
Aliya Mosha (s2206786)
Saioa Galvin (s2516907)
Hugh Graham (s2318382)
Contents
In this report we investigate the statistical techniques used by the forensic services at Police Scotland when calculating blood alcohol concentrations in criminal cases associated with drink driving. In UK law, it is an offence to drive with blood alcohol concentration (BAC) above the prescribed limit (0.47g/kg). Since alcohol levels fall over time, the measured BAC at the police station is lower than at the time of driving; forensic scientists ‘back-calculate’ the earlier level using an elimination rate (Forensic Science Regulator 2020).
The current method implemented by Police Scotland incorporates the \(\beta\) elimination rate, a measure of how quickly alcohol is eliminated from the bloodstream. A fixed range for \(\beta\) is applied across all individuals, using the 2.5th percentile of the distribution as a conservative cutoff. If the back‑calculated BAC exceeds the legal limit under this assumption, the individual is reported as over the limit. While simple, this binary approach does not reflect the variability of \(\beta\) across individuals or the uncertainty inherent in this calculation.
Using a dataset of 100 individuals (Armbrecht 2008), we examined how \(\beta\) varies based on individual characteristics (weight, height, age, sex). We found that there is significant negative correlations between \(\beta\) and both weight and height, a clear sex-specific difference, with males typically showing higher elimination rates. Further, we found age has little influence on \(\beta\).
We then applied these findings by comparing several modelling strategies:
Parametric Distributions: A Gamma distribution provided the most accurate fit to the observed \(\beta\) values.
Linear Regression: Useful approach for incorporating predictors, though a violation of assumptions was apparent.
Quantile Regression: A more appropriate approach when estimating the tails of \(\beta\), as used in back-calculations.
We propose replacing the current binary cutoff with a probabilistic and interval-based framework: providing the probability that the BAC at the time of driving exceeded the legal limit, along with a 95% prediction interval for the back-calculated BAC.
As an example, we found that under the current approach an individual would be deemed under the legal limit, whereas our suggested approaches provide probabilities of around 75% that they were indeed over the limit. This showed a substantial likelihood that the individual was above the legal threshold at the time of driving.
We also considered situations where no blood sample is available. In this case, we have to rely on eyewitness dose of alcohol, weight of the suspect and the so-called volume of distribution of alcohol in the blood, \(V_d\). The standard approach treats \(\beta\) and \(V_d\) as independent and applies fixed percentile ranges for both.
We found \(V_d\) and \(\beta\) to be negatively correlated, meaning treating them as independent and combining extreme marginal percentiles can lead to implausible scenarios. Hence, we suggest reports should acknowledge this relationship and avoid the assumption of independence.
In conclusion, our findings support a shift towards a probabilistic reporting framework in forensic blood alcohol analysis. We recommend expert witness reports should include:
Measured BAC and elapsed time since driving.
The back-calculated BAC at the time of driving with a 95% uncertainty interval.
The probability of exceeding the legal limit.
Clear disclosure of assumptions about \(\beta\) and \(V_d\) when applicable, including their correlation.
This approach ultimately enhances transparency, communicates uncertainty, and provides courts with a more scientifically defensible basis for decision making.
The \(\beta\) elimination rate (measured in g/kg/h) measures how quickly alcohol is eliminated from the bloodstream, and is used by forensic scientists at Police Scotland when ‘back-calculating’ the BAC at the time of offence (Forensic Science Regulator 2020). This calculation is of form
\[C_0 = C_t +\beta t,\]
where \(C_t\) is the measured BAC at time \(t\) hours after the alleged offence, and \(C_0\) is the estimated BAC at the time of driving.
Currently, forensic scientists use a single, fixed range of \(\beta\) values for all individuals. This approach simplifies practical implementation but does not account for variability due to characteristics of the individual. In particular, factors such as age, gender, weight, or height could influence the rate at which alcohol is eliminated from the bloodstream.
Hence, it is worth exploring whether \(\beta\) can be more effectively modelled to reflect the variability in individual’s characteristics, rather than relying on a fixed population range.
The dataset analysed in this study was taken from a PhD thesis, and contains measurements associated with BAC (Armbrecht 2008). In particular, the dataset contains 100 observations with characteristic information about the individual, and details about the dose of alcohol consumed by the individual.
Key variables that are explored in this report include:
Beta60: the negative of the \(\beta\) elimination rate (g/kg/h), though
for our analysis we took the corresponding positive variable
beta;
C0: the concentration of alcohol in the blood at
time zero;
Weight: the weight (in kg) of a given
individual;
Height: the height (in cm) of a given
individual;
Age: the age (in years) of a given
individual;
Sex: the sex of a given individual;
Amount of Alcohol Consumed: the dose (in g) of
alcohol consumed.
In order to understand and evaluate the performance of the \(\beta\) elimination rate, we first examine various plots to visualise how the rate is influenced by certain characteristics. The characteristics of interest are weight, height, age, and sex.
These are each plotted against the \(\beta\) elimination rate. Intuitively a larger \(\beta\) value corresponds to faster elimination of alcohol from the blood stream. Figures 1-4 below explore these relationships.
From Figures 1-4, we can deduce the following:
as weight increases the elimination rate decreases linearly;
similarly, as height increases the elimination rate decreases linearly;
the majority of age data is scattered between 20-30, however there is a reasonably constant spread of values, suggesting age is less influential on alcohol elimination;
there is a clear difference in elimination rate between sexes, with males typically obtaining higher values.
To further enhance our investigation into the effect each of these characteristics has on the elimination rate, Table 1 evaluates the significance of correlation between each characteristic and \(\beta\).
| Characteristic | Correlation | P_value |
|---|---|---|
| Age | 0.041 | 0.68772 |
| Height | -0.410 | 0.00002 |
| Weight | -0.356 | 0.00027 |
From this output, we can see that weight and height have significant correlations with the \(\beta\) elimination rate, with respective correlations of -0.356 and -0.410. This suggests that these characteristics play a crucial role in determining the rate at which alcohol is eliminated from the blood stream. Further, the correlation test returned a p-value below 0.05 for both of these characteristics, indicating that there is evidence to reject the null hypothesis that there is no correlation between these characteristics and \(\beta\). As expected, there is a low correlation between age and elimination rate, and a large p-value of 0.68772 further validates this. Hence, we can deduce that there is little correlation between age and blood alcohol elimination rate.
Due to the clear relationships between \(\beta\) and characteristics of the tested individuals, it is worth considering approaches in which these factors are incorporated to accurately predict the rate at which alcohol is eliminated from the blood stream.
A key issue with creating a distribution from the \(\beta\) elimination rate values is that this distribution is based on data from 100 individuals. Therefore there is a risk that it fails to capture the general trend of the whole population of Scotland. We propose instead investigating if values of \(\beta\) can be taken from a standard distribution. First, we will examined the general spread of \(\beta\) values given in the Figure 5, where the mean, 2.5th, and 97.5th quantiles are indicated in red.
We have seen from exploratory data analysis that alcohol elimination rate depends on a range of variables. Our initial analysis showed that sex is a significant factor, so we can split the data to see how \(\beta\) is distributed for males and females. Figure 6 shows the histogram and density of the data when categorised by sex.
Figures 5 and 6 can point us towards a few different distributions. We can analyse how well the data fits these distributions in the residuals.
We can use Figures 5 and 6 to see that the shape is similar to normal, beta and gamma distributions. Our method selection also relies on simplicity, as we do not want to over-fit our chosen distribution on such a small dataset. To evaluate how well our data fits our chosen distributions, we have analysed the residual (\(\epsilon_i\)) graphs in Figure 7.
Empirical and Theoretical Densities: The theoretical peak value is skewed compared to the empirical peak, and has lower density.
Q-Q plot: Lack-of-fit observed at distribution tails. The left tail is heavy while the right tail is light.
Empirical and Theoretical CDFs: Poor fit at central values of the distribution.
P-P plot: Lack-of-fit at the distribution centre, as the central values deviate from the linear relationship.
Empirical and Theoretical Densities: Improved fit of peak is observed, while still not completely aligned.
Q-Q plot: Points are randomly distributed above and below the general linear trend, with outliers at the lower tail.
Empirical and Theoretical CDFs: Improved fit at distribution centre compared to normal.
P-P plot: Improved fit at distribution centre compared to normal.
Empirical and Theoretical Densities: Best fit of peak of the analysed distributions.
Q-Q plot: Points are randomly distributed above and below the general linear trend across all quantiles.
Empirical and Theoretical CDFs: Best fit of theoretical distribution to data.
P-P plot: Majority of points lie on fitted line.
After looking at the residuals, we conclude that Gamma is the most appropriate distribution for \(\beta\). The fitted distribution is estimated as \(\beta \sim \text{Gamma}(\alpha = 31.95, \lambda = 173.71)\).
The density graph of the Gamma distribution across the whole population is shown in Figure 8 to demonstrate fit, again with the mean, 2.5th, and 97.5th quantiles indicated in red.
Considering that sex is a factor variable, we can approximate the distributions of \(\beta\) dependent on whether the suspect is male of female.
We have fitted two sex-specific distributions to the data, \(\beta_{male} \sim \text{Gamma}(\alpha = 36.16, \lambda = 207.83)\) and \(\beta_{female} \sim \text{Gamma}(\alpha = 40.00, \lambda = 199.77)\). This is an improvement of the Gamma fit on the whole population, as we are able to more accurately predict the confidence interval of alcohol elimination rate of our suspect. For Police Scotland, this is vital as we are using as much information as we have at our disposal to make a conclusion in the case.
While fitting a distribution by sex has allowed us to incorporate more information on the suspect, without overfitting to a small dataset, sex is not the only characteristic given to the police. By creating a linear model, we can create a distribution for \(\beta\) given a range of information about the suspect.
From the exploratory data analysis, we can deduce that weight and height cause a linear downward trend in \(\beta\), however, we see that age has little impact. When creating our model we are focusing on balancing accuracy and simplicity. Therefore we can create the following model, where \(\alpha_j\) represent the coefficients for \(j \in \{0, 1, 2, 3\}\) to model \(\beta_i\), where \(i \in \{1, \dots, 100\}\):\[ \begin{aligned} \text{linear model}: \quad E[\beta]_i &= \alpha_0 + \alpha_1\text{weight}_i + \alpha_2\text{height}_i + \alpha_3\text{sex}_i + \epsilon_i, \\ \epsilon_i &\overset{\mathrm{iid}}\sim \mathcal{N}(0, \sigma^2). \end{aligned} \]
To analyse the accuracy of fit of this model, we can examine the residual (\(\epsilon_i\)) plots, as shown in Figure 10.
Residual vs Fitted: There is a change of variance with mean, so the constant variance assumption is slightly violated. Unfortunately there is little data points to establish if this is an important violation.
Q-Q Residuals: The ordered standardised residuals are plotted against quantiles of a standard normal. Lack of fit observed at distribution tails. The left tail is heavy while the right tail is light.
Scale-Location: We have random variation around the mean, so there is no violation of the constant variation assumption.
Residuals vs Leverage: This plot gives us Cook’s distance, which measures the change in all model fitted values on omission of the data point in question. From the plot we can see there are no outliers, so there aren’t any highly influential points in our data.
To try to resolve these violations of the assumptions on residuals, we can create an alternative linear model. Given that there is a quadratic shape in the Residuals vs Fitted graph, we create the following linear model, where \(\alpha_j\) represent the new coefficients for \(j \in \{0, 1, 2, 3\}\) to model \(\beta_i\), the \(i = 100\) datapoints of blood alcohol elimination rate \(\beta\):
\[ \begin{aligned} \text{square root linear model}: \quad E[\sqrt{\beta}]_i &= \alpha_0 + \alpha_1\text{weight}_i + \alpha_2\text{height}_i + \alpha_3\text{sex}_i + \epsilon_i, \\ \epsilon_i &\overset{\mathrm{iid}}\sim \mathcal{N}(0, \sigma^2). \end{aligned} \]
We get the residual graphs in Figure 11.
The Q-Q Residuals plot is improved, however there is only marginal change in the Residuals vs Fitted plot. Therefore we explore alternative methods.
The biggest issue with the linear model above is that it’s aim is to find the conditional mean function, \(E[\beta | x]\), where \(x\) represents the explanatory variables of the model (weight, height and sex). The aim of our analysis is to find a threshold of \(\beta\) that can be used to determine if a suspect had high blood alcohol levels while driving. Therefore we want to investigate further than the expectation, \(E[\beta | x]\), and examine the spread of the whole conditional distribution, \(\beta | x\). When investigating quantiles of linear models, it is more accurate to use quantile regression (Rodriguez and Yao 2017). As the square root made little difference, we adapt our quantile regression from the initial linear model above.
Therefore we are finding the quantiles (\(\tau\)) of \(\beta\) in thefollowing model (Katchova 2013), where \(\alpha_{j,\tau}\) represent the coefficients associated with the \(\tau^{\text{th}}\) quantile for \(j \in \{0, 1, 2, 3\}\) to estimate \(\beta_i\), where \(i \in \{1, \dots, 100\}\):\[ \begin{aligned} \text{quantile regression model}: \quad \beta_{i,\tau} &= \alpha_{0,\tau} + \alpha_{1,\tau}\text{weight}_i + \alpha_{2,\tau}\text{height}_i + \alpha_{3,\tau}\text{sex}_i + \epsilon_{i,\tau}, \\ \epsilon_{i,\tau} &\overset{\mathrm{iid}}\sim \mathcal{N}(0, \sigma^2). \end{aligned} \]To further demonstrate the difference between the linear and quantile regression, consider how we would find the 2.5% quantile of \(\beta\) in both methods. For the linear model method, we have to find \(E[\beta | x] + 2.5\text{% quantile of } \epsilon_i\). This assumes that the residuals are equally distributed across the whole population of \(\beta\), which could be inaccurate. On the other hand, in quantile regression, we simply find the 2.5% quantile of the distribution of \(\beta | x\). Not only does this not assume that residuals are equally distributed, but it is an intuitive approach, more aligned with the current methodology of Police Scotland, while improving the accuracy of results.
The subtle differences between the methods produce different final results, as shown in Figure 12.
Limitations of the Current Approach
The current forensic practice uses a single threshold - the 2.5th percentile of the distribution of \(\beta\) - to make binary decisions about whether an individual exceeded the legal limit.
Currently, if the back-calculated \(C_0\) using the 2.5th percentile of \(\beta\) exceeds the legal limit, the individual is reported as being over the limit. While this approach is straightforward to implement, it has several significant drawbacks:
It treats uncertainty as certainty: Using a single value for \(\beta\) implies precision that does not exist. Our analysis shows that \(\beta\) varies considerably between individuals.
It ignores individual characteristics: The current method applies the same \(\beta\) value to everyone, despite clear evidence that elimination rates differ based on weight, height, and sex.
It provides no measure of confidence: A binary “over/under” verdict gives no indication of how confident we should be in that conclusion. An individual barely above the threshold is treated identically to one far exceeding it.
It may lead to inconsistent outcomes: Two individuals with identical measured BAC values but different characteristics may have had very different BAC levels at the time of driving, yet receive the same assessment.
Proposed Framework: Probabilistic Reporting
We recommend that Police Scotland adopt a reporting framework that explicitly acknowledges and quantifies uncertainty. Rather than providing a single yes/no answer, reports should include:
This approach transparently communicates both what we know and what remains uncertain, allowing legal decision-makers to weigh the evidence appropriately. We evaluated three approaches for generating these probability statements.
By fitting a Gamma distribution to the observed elimination rates across the entire dataset, we obtain a smooth representation of how \(\beta\) varies in the data. The fitted distribution is \(\beta \sim \text{Gamma}(\alpha = 31.95, \lambda = 173.71)\).
The 95% prediction interval for \(C_0\) is constructed using the 2.5th and 97.5th percentiles of the Gamma distribution:\[ C_0 \in [C_t + \beta_{0.025} \times t,\ C_t + \beta_{0.975} \times t]. \]Using the fitted distribution, we can also calculate the probability that the individual exceeded the legal limit at the time of driving,
\[ \begin{eqnarray} \mathbb{P}(C_0 > x) &=& 1 - F_{\beta}\left(\frac{x - C_t}{t}\right), \end{eqnarray}\]where \(x\) is the legal limit (0.47 g/kg) and \(F_{\beta}\) is the cumulative distribution function of the fitted Gamma distribution for \(\beta\).
Advantages: Simple to implement; requires only the measured BAC and elapsed time; provides conservative estimates that account for population-wide variability.
Limitations: Does not incorporate individual characteristics. For example, it treats a 20-year-old male and a 70-year-old female identically.
Our analysis revealed clear differences in elimination rates between males and females. We can fit separate Gamma distributions for each sex:
This provides more tailored estimates while maintaining simplicity.
Similar to above, we can find the probability that the individual exceeded the legal limit at the time of driving:
where \(x\) is the legal limit (0.47 g/kg) and \(F_{\beta_{\text{male}}}, F_{\beta_{\text{female}}}\) are the cumulative distribution functions of the fitted sex-specific Gamma distributions for \(\beta\).
Advantages: Accounts for the most significant individual characteristic; still straightforward to implement.
Limitations: Ignores other relevant factors such as weight and height, and there is more danger of overfitting the sex-specific distributions as there are fewer values once the dataset is subdivided.
Quantile regression allows us to model how the lower and upper bounds of the elimination rate vary with individual characteristics. Rather than predicting the average \(\beta\) for someone with given characteristics, this method directly estimates the 2.5th and 97.5th percentiles of their elimination rate.
This model finds the \((\tau\) x \(100)\)% quantile for \(\beta\):\[ \begin{aligned} \text{quantile regression model}: \quad \beta_{i,\tau} &= \alpha_{0,\tau} + \alpha_{1,\tau}\text{weight}_i + \alpha_{2,\tau}\text{height}_i + \alpha_{3,\tau}\text{sex}_i + \epsilon_{i, \tau}, \\ \epsilon_{i,\tau} &\overset{\mathrm{iid}}\sim \mathcal{N}(0, \sigma^2). \end{aligned} \]For a given individual \(p\), we obtain personalized bounds: \(\beta_{p,0.025}\) and \(\beta_{p,0.975}\). To find the probability that the individual is above the limit, we use a Monte Carlo approximation. Based on our exploratory data analysis, we deduce that the majority of \(\beta\) values lie around a central value, so we choose a triangular distribution. We simulate 10000 samples from \(\beta \sim \text{Triangular}(\beta_{p,0.025}, \beta_{p,0.5}, \beta_{p,0.975})\). Then, we use the equation \(C_0 = C_t + \beta t\). Then we find the mean number of times that our simulated \(C_0\) is greater than the legal limit:\[\begin{eqnarray} \mathbb{P}(C_0 > x) &=& \mathbb{P}(\text{# simulated samples of C}_0 \text{ s.t. C}_0 > x). \end{eqnarray}\]
Advantages: Fully accounts for individual variability; provides the most accurate prediction intervals; does not assume constant variability across the population.
Limitations: Requires more information about the individual; slightly more complex to implement.
Benefits for the Justice System
This probabilistic framework offers several advantages:
To illustrate these methods, consider a 70-year-old female (weight: 70 kg, height: 160 cm) who provides a blood sample 2 hours after driving, with a measured BAC of \(C_t\) = 0.15 g/kg. The legal limit is 0.47 g/kg. The police wish to charge her with the criminal offence of driving while over the legal limit.
1. Current Approach
Using \(\beta\) = 0.126 g/kg/h (the fixed 2.5th percentile):
\[ C_0 = 0.15 + 0.126 \times 2 = 0.402 \ g/kg \]Verdict: Below the legal limit.
2. Proposed Approach Using Whole Population Gamma Distribution
The 95% prediction interval for \(C_0\) is (0.40, 0.66) g/kg, and the probability that \(C_0\) exceeded the legal limit is 76%.
Verdict: While the lower bound of the prediction interval (0.40 g/kg) is below the legal limit, there is a substantial probability (76%) that the individual’s true BAC at the time of driving exceeded 0.47 g/kg. The current method would have concluded she was below the limit, but our analysis reveals considerable uncertainty in that conclusion.
3. Proposed Approach Using Sex-Specific Gamma Distribution
The 95% prediction interval for \(C_0\) is (0.44, 0.68) g/kh, and the probability that \(C_0\) exceeded the legal limit is 91%.
Verdict: The lower bound of this prediction interval is very close to the legal limit, and the prediction interval found shows that there is significant evidence to indicate that the individual’s true BAC at the time of driving exceeded 0.47 g/kg, with a high probability of 91%. This is a serious contrast to the current approach.
4. Proposed Approach Using Quantile Regression
We fit the model on the original dataset to find the 2.5% quantile, 50% quantile (median) and 97.5% quantile of \(\beta | x\). Then we can predict given the suspect’s specific characteristics (female, 70 kg, 160 cm, 70 years old), to find the 95% prediction interval is (0.43, 0.69) g/kg. The probability that \(C_0\) exceeded the legal limit is 94%.
Verdict: This method takes into account all the significant information about the suspect, and makes a measured conclusion with justified uncertainty. Under this method, there is 94% chance that the individual’s true BAC exceeded the legal limit.
Comparison of Methods
The comparison of all three methods is shown in Table 4 and Figure 13, which clearly demonstrate how the current approach provides a single point estimate that masks substantial uncertainty. Table 4 shows the results of applying our different approaches to the case described above.
| Approach | Lower (2.5%) | Median (50%) | Upper (97.5%) |
|---|---|---|---|
| Empirical Quantiles | 0.403 | 0.509 | 0.654 |
| Gamma Fitted | 0.402 | 0.514 | 0.656 |
| Female Gamma Fitted | 0.436 | 0.547 | 0.684 |
| Quantile Regression | 0.429 | 0.539 | 0.693 |
Figure 13 is a visual representation of the quantiles found above, also showing the probability of exceeding the legal limit.
Recommended Report Format
We propose that forensic reports include a standardised summary table that presents all relevant information clearly and concisely. Given our detailed analysis above, we conclude that quantile regression is the most accurate method for predicting the true BAC of an individual. Figure ? gives a summary of the most important information for use by Police Scotland.
| Quantity | Value |
|---|---|
| Measured BAC (Ct) | 0.15 g/kg |
| Time since driving (t) | 2 hours |
| Legal limit | 0.47 g/kg |
| Estimated C₀ (central) | 0.54 g/kg |
| 95% uncertainty interval | (0.43, 0.69) g/kg |
| Probability C₀ exceeded legal limit | 94% |
If it is too late for a blood or breath test to predict accurate results for \(C_0\), but there is eyewitness evidence measuring the quantity of alcohol consumed, Widmark’s equation can help to calculate the blood alcohol concentration \(t\) hours after drinking (\(C_t\)). The validity of this estimate is dependent on the reliability of the eyewitness’ testimony, however this analysis can be combined with other evidence to support a case. Widmark’s equation can be written as
\[ C_t = \frac{A}{\text{Weight} \times V_d} - \beta t, \]
where \(A\) is the observed does of alcohol consumed, \(\text{Weight}\) is the weight of the individual, and \(V_d\) is a parameter known as the volume of distribution, “a proportionality constant that relates the total amount of drug (in our case alcohol) in the body to the plasma concentration of the drug at a given time.” (Mansoor and Mahabadi 2023) \(V_d\) is measured with the units L/kg. Rearranging this equation we find
\[ C_0 = C_t +\beta_t = \frac{A}{\text{Weight} \times V_d}, \]
which in turn implies
\[ V_d = \frac{A}{\text{Weight} \times C_0}. \]
Hence, given we are provided the data for \(A\), \(\text{Weight}\), and \(C_0\), we can calculate \(V_d\) using the above equation.
Similar to our approach in analysing the effectiveness of the \(\beta\) elimination rate, we can investigate the relationships between \(V_d\) and characteristics of individuals (weight, height, age, and sex). It is also worth investigating the distribution of \(V_d\) values across our data.
Figures 14-18 help us visualise these relationships.
From Figure 5, we can see that the majority of \(V_d\) values lie around 0.65, in a roughly normally distributed pattern, with the mean, 2.5th, and 97.5th quantiles highlighted in red. There appears to be one particular outlier, with a \(V_d\) value of around 1.2, which may be worth investigating further.
Figures 6-9 highlight the following relationships:
\(V_d\) shows a very slight negative trend as weight increases;
\(V_d\) appears constant across the range of heights;
\(V_d\) appears to decrease with age, although again the majority of the data lies between ages of 20-30, so general conclusions are hard to draw from the sample;
\(V_d\) appears constant across both sexes.
The above findings suggest that \(V_d\) seems to differ little across various characteristics. This suggests \(V_d\) is a more generalised approach compared to the \(\beta\) elimination rate.
It is also worth considering how the two rates are correlated, as independence is assumed between \(\beta\) and \(V_d\). Performing a correlation test between the two can help to deduce whether this assumption is upheld. Further, Figure 19 helps to visualise the relationship between \(\beta\) and \(V_d\).
From the correlation test, we find that there is an observed correlation of -0.404, and the null hypothesis that the true correlation is equal to zero, i.e. the two measures are independent, can be confidently rejected as there is a corresponding p-value of 3.122x10⁻⁵.
Figure 19 further highlights this clear relationship between the two measures, and hence the independence assumption appears to be violated.
The current method offers several key strengths that make it well-suited for forensic applications. Its straightforward approach is both easy to implement and readily explained in court proceedings, which is essential for ensuring that evidence is understood by legal professionals and juries. By using the 97.5th percentile, the method takes a conservative stance that provides defendants with the benefit of the doubt, aligning with fundamental principles of justice. The standardised nature of this approach ensures consistency across all cases, preventing arbitrary variations in how evidence is evaluated. Furthermore, the methodology’s transparency and reproducibility give it scientific rigour, allowing independent verification of results and supporting its credibility in legal contexts.
The current forensic method assumes independence between \(\beta\) and \(V_d\) when calculating the range for \(C_t\). However, our exploratory data analysis shows evidence of a significant correlation. This finding directly contradicts the independence assumption underlying the current approach.
Although the official forensic procedure specifies using the 97.5th percentile for both \(\beta\) and \(V_d\), this convention is based on policy rather than mathematical reasoning. Forensic laboratories apply the same “upper percentile rule” across all parameters to maintain consistency and to err on the side of caution, as using higher parameter values for \(\beta\) and \(V_d\) generally produces lower blood alcohol estimates. Mathematically, Widmark’s equation shows that \(C_t\) decreases as \(\beta\) and \(V_d\) both increase, thus the maximum possible value of \(C_t\) would occur for low \(\beta\) and low \(V_d\) values (their 2.5th percentiles). Regardless of which tail is used, the critical statistical issue with the current method is that it assumes that \(\beta\) and \(V_d\) are independent, when our analysis shows that \(\beta\) and \(V_d\) have a significant negative correlation.
Assuming independence allows forensic analysts to combine their most extreme values (e.g. highest \(\beta\) with highest \(V_d\)) even though such an extreme combination almost never occurs in practice. This may produce inaccurate \(C_t\) estimates, exaggerating how high a person’s blood alcohol concentration might have been at the time of interest or significantly underestimating. In statistical terms, the current approach treats the joint distribution of \(\beta\) and \(V_d\) as the product of their marginal distributions, when in reality the joint distribution should account for their negative dependence structure.
The joint distribution in Figure 20 illustrates how \(\beta\) and \(V_d\) co-vary in practice. The dashed blue lines mark the marginal 97.5th percentiles used under the current forensic procedure, but the plot shows that no observations lie in the upper-right region where both parameters simultaneously take these extreme values. In other words, the combination of a very high elimination rate and a very high volume of distribution - treated as a realistic possibility when assuming independence - does not occur anywhere in the data. The density contours further emphasise that the highest joint probability mass lies along a negatively sloped region, reflecting the negative correlation identified earlier. Thus, the current method constructs “best-case” scenarios using parameter pairs that are not supported by the empirical joint behaviour of the data.
The empirical joint distribution of \(C_t\) provides a more realistic representation of uncertainty by preserving the dependence structure between \(\beta\) and \(V_d\). As an example, for an individual of 70 kg consuming the mean recorded alcohol dose, the resulting 95% empirical range for \(C_t\) is (0.659, 1.455). By contrast, the value obtained under the independence assumption (0.497) - using both 97.5th percentiles simultaneously - falls outside this plausible interval, indicating that the current approach can substantially misrepresent the credible bounds of \(C_t\). This suggests that the independence-based method may be overly conservative in its construction of back-calculations, in some cases underestimating blood alcohol levels at the time of interest.
| Method | Lower_2.5 | Median | Upper_97.5 |
|---|---|---|---|
| Empirical joint (β,Vd) | 0.659 | 1.015 | 1.455 |
| Independent 97.5th percentiles | 0.497 |
Taken together, these results reinforce the overarching conclusion of our findings: forensic estimates should be grounded in distributions that reflect real variability and dependence between physiological parameters, and uncertainty should be communicated explicitly rather than embedded in fixed, extreme percentile choices.
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(afex))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(viridis))
suppressPackageStartupMessages(library(gridExtra))
suppressPackageStartupMessages(library(patchwork))
suppressPackageStartupMessages(library(kableExtra))
suppressPackageStartupMessages(library(cv))
suppressPackageStartupMessages(library(stringr))
suppressPackageStartupMessages(library(fitdistrplus))
suppressPackageStartupMessages(library(readxl))
suppressPackageStartupMessages(library(quantreg))
suppressPackageStartupMessages(library(purrr))
suppressPackageStartupMessages(library(broom))
#%%%%%%%%%%%%%%%%%%%%%%%%%% DATA WRANGLING %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
data <- read_excel("SCS_BAC_and_BrAC_split_TOP.xlsx")
data$sex <- as.factor(data$Sex)
data$beta60 <- data$`Beta60 (g/kg/h)`
data$weight <- data$`Weight (kg)`
data$height <- data$`Height (cm)`
data$age <- data$`Age (years)`
data$beta <- -data$beta60
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%% BETA EDA %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# beta vs weight
weight_plot <- ggplot(data, aes(x=weight, y=beta, colour = sex)) +
geom_point(size = 3) +
geom_smooth(method="lm", color="navy") +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
labs(title="Figure 1: β vs Weight", x="Weight (kg)",
y="β Elimination Rate (g/kg/h)")
# beta vs height
height_plot <- ggplot(data, aes(x=height, y=beta, colour = sex)) +
geom_point(size = 3) +
geom_smooth(method="lm", color="navy") +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
labs(title="Figure 2: β vs Height", x="Height (cm)",
y="β Elimination Rate (g/kg/h)")
# beta vs age
age_plot <- ggplot(data, aes(x=age, y=beta, colour = sex)) +
geom_point(size = 3) +
geom_smooth(method="lm", color="navy") +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
labs(title="Figure 3: β vs Age", x="Age (years)",
y="β Elimination Rate (g/kg/h)")
# beta vs gender
sex_plot <- ggplot(data, aes(x = sex, y = beta60,
fill = sex)) +
geom_violin(alpha = 0.8) +
geom_boxplot(width = 0.2, color = "navy", alpha = 0.7) +
scale_fill_manual(values = c("seagreen", "skyblue")) +
labs(
title = "Figure 4: β vs Sex",
x = "Sex",
y = "β Elimination Rate (g/kg/h)"
) +
guides(fill = "none")
# Compute correlation and p-value between beta and each characteristic.
corr_table <- data %>%
dplyr::select(beta, weight, age, height) %>%
pivot_longer(cols = c(weight, age, height),
names_to = "Characteristic", values_to = "Value") %>%
group_by(Characteristic) %>%
summarise(
Correlation = cor(beta, Value, use = "pairwise.complete.obs"),
P_value = cor.test(beta, Value)$p.value
) %>%
mutate(
Characteristic = str_to_title(Characteristic),
Correlation = round(Correlation, 3),
P_value = signif(P_value)
)
#%%%%%%%%%%%%%%%%%%%% BETA DISTRIBUTION TESTING %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
density_plot <- ggplot(data, aes(x = beta)) +
geom_histogram(aes(y = after_stat(density)),
binwidth = 0.005,
fill = "steelblue",
alpha = 0.55) +
geom_density(color = "navy",
size = 3) +
geom_vline(aes(xintercept=mean(beta)),
linetype = "dashed",
color = "red",
size = 2) +
geom_vline(aes(xintercept=quantile(beta, 0.025)),
linetype = "dotted",
color = "red",
size = 2) +
geom_vline(aes(xintercept=quantile(beta, 0.975)),
linetype = "dotted",
color = "red",
size = 2) +
labs(x = expression(""*beta), y = "Density",
title = expression("Figure 5: Distribution of "*beta))
# define colours
sex_cols <- c("male" = "skyblue", "female" = "seagreen")
# compute group statistics
stats <- data %>%
group_by(sex) %>%
summarise(
mean = mean(beta),
q025 = quantile(beta, 0.025),
q975 = quantile(beta, 0.975)
)
# main plot
sex_density_plot <- ggplot(data, aes(x = beta, colour = sex)) +
# density curves
geom_density(size = 3) +
# mean lines
geom_vline(data = stats,
aes(xintercept = mean, colour = sex, linetype = "Mean"),
size = 2.2) +
# lower quantile
geom_vline(data = stats,
aes(xintercept = q025, colour = sex, linetype = "2.5th quantile"),
size = 2) +
# upper quantile
geom_vline(data = stats,
aes(xintercept = q975, colour = sex, linetype = "97.5th quantile"),
size = 2) +
# unified colour control
scale_colour_manual(values = sex_cols, name = "Sex") +
scale_fill_manual(values = sex_cols, name = "Sex") +
# consistent linetypes
scale_linetype_manual(
name = "Statistic",
values = c("Mean" = "dashed",
"2.5th quantile" = "dotted",
"97.5th quantile" = "dotted")
) +
labs(
x = expression(beta),
y = "Density",
title = expression("Figure 6: Distribution of " * beta * " by Sex")
)
# Fit distributions
normal_fit <- fitdist(data$beta, distr = "norm", method = "mle")
beta_fit <- fitdist(data$beta, distr = "beta", method = "mle")
gamma_fit <- fitdist(data$beta, distr = "gamma", method = "mle")
plot.legend <- c("Normal", "Beta", "Gamma")
# Layout
par(mfrow = c(2, 2), mar = c(5, 5, 3, 2), oma = c(0, 0, 5, 0), cex = 2)
# Comparing fit of different distribution approaches.
# --- DENSITY (custom lines) ---
cols <- c("red", "green", "blue")
ltys <- c(1, 2, 3)
lwds <- rep(5, 5)
denscomp(
list(normal_fit, beta_fit, gamma_fit),
legendtext = plot.legend,
fitcol = cols,
fitlty = ltys,
fitlwd = lwds,
xlab = "beta",
ylab = "Density",
main = "Density Comparison",
xlegend = "topright"
)
# --- QQ PLOT (use defaults) ---
qqcomp(
list(normal_fit, beta_fit, gamma_fit),
legendtext = plot.legend,
fitcol = cols,
main = "QQ Plot Comparison",
xlegend = "bottomright"
)
# --- CDF PLOT (defaults) ---
cdfcomp(
list(normal_fit, beta_fit, gamma_fit),
legendtext = plot.legend,
fitcol = cols,
fitlty = ltys,
fitlwd = lwds,
main = "CDF Comparison",
xlegend = "bottomright"
)
# --- PP PLOT (defaults) ---
ppcomp(
list(normal_fit, beta_fit, gamma_fit),
legendtext = plot.legend,
fitcol = cols,
main = "PP Plot Comparison",
xlegend = "bottomright"
)
# Gamma approach
# Extract fitted parameters
gamma_shape <- gamma_fit$estimate[[1]]
gamma_rate <- gamma_fit$estimate[[2]]
# Build smooth gamma curve over the observed range
gamma_curve <- data.frame(
x = seq(min(data$beta), max(data$beta), length.out = 1000)
) %>%
mutate(density = dgamma(x, shape = gamma_shape, rate = gamma_rate))
# Gamma quantiles
stats_gamma <- data.frame(
Quantile = c("2.5%", "50%", "97.5%"),
value = qgamma(c(0.025, 0.50, 0.975), gamma_shape, gamma_rate)
)
# Plot
gamma_plot <- ggplot(data, aes(x = beta)) +
# Empirical histogram
geom_histogram(
aes(y = after_stat(density)),
binwidth = 0.005,
fill = "steelblue", alpha = 0.5
) +
# Fitted gamma density
geom_line(
data = gamma_curve,
aes(x = x, y = density),
colour = "navy",
linewidth = 3
) +
# Gamma quantile lines
geom_vline(
data = stats_gamma,
aes(xintercept = value, linetype = Quantile),
colour = "red",
linewidth = 1.2
) +
scale_linetype_manual(
values = c("2.5%" = "dotted", "50%" = "dashed", "97.5%" = "dotted")
) +
labs(
x = expression(beta),
y = "Density",
title = expression("Figure 8: Empirical Distribution of " * beta *
" with Fitted Gamma Model"),
linetype = "Gamma Quantile"
)
# Gamma by Sex
# Fit gamma models for each sex
m_data <- data %>% filter(Sex == "male")
m_gamma_fit <- fitdist(m_data$beta, "gamma", method = "mle")
m_gamma_shape <- m_gamma_fit$estimate[[1]]
m_gamma_rate <- m_gamma_fit$estimate[[2]]
m_quantiles <- qgamma(c(0.025, 0.5, 0.975), m_gamma_shape, m_gamma_rate)
f_data <- data %>% filter(Sex == "female")
f_gamma_fit <- fitdist(f_data$beta, "gamma", method = "mle")
f_gamma_shape <- f_gamma_fit$estimate[[1]]
f_gamma_rate <- f_gamma_fit$estimate[[2]]
f_quantiles <- qgamma(c(0.025, 0.5, 0.975), f_gamma_shape, f_gamma_rate)
# build density curves
m_curve <- tibble(
x = seq(min(m_data$beta), max(m_data$beta), length.out = 1000),
density = dgamma(x, m_gamma_shape, m_gamma_rate)
)
f_curve <- tibble(
x = seq(min(f_data$beta), max(f_data$beta), length.out = 1000),
density = dgamma(x, f_gamma_shape, f_gamma_rate)
)
# Quantile df for geom_vline
m_stats <- tibble(
Quantile = c("2.5%", "50%", "97.5%"),
value = m_quantiles
)
f_stats <- tibble(
Quantile = c("2.5%", "50%", "97.5%"),
value = f_quantiles
)
# Male plot
male_plot <- ggplot(m_data, aes(beta)) +
geom_histogram(aes(y = after_stat(density)),
binwidth = 0.005,
fill = "skyblue",
alpha = 0.4) +
geom_line(data = m_curve,
aes(x = x, y = density),
colour = "skyblue4",
linewidth = 3) +
geom_vline(data = m_stats,
aes(xintercept = value, linetype = Quantile),
colour = "skyblue4",
linewidth = 2) +
scale_linetype_manual(
values = c("2.5%" = "dotted",
"50%" = "dashed",
"97.5%" = "dotted")
) +
labs(
title = "Male",
x = expression(beta),
y = "Density"
)
# Female plot
female_plot <- ggplot(f_data, aes(beta)) +
geom_histogram(aes(y = after_stat(density)),
binwidth = 0.005,
fill = "seagreen",
alpha = 0.4) +
geom_line(data = f_curve,
aes(x = x, y = density),
colour = "seagreen4",
linewidth = 3) +
geom_vline(data = f_stats,
aes(xintercept = value, linetype = Quantile),
colour = "seagreen4",
linewidth = 2) +
scale_linetype_manual(
values = c("2.5%" = "dotted",
"50%" = "dashed",
"97.5%" = "dotted")
) +
labs(
title = "Female",
x = expression(beta),
y = "Density"
)
#%%%%%%%%%%%%%%%%%%%%%%% LINEAR MODELLING BETA %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# linear modelling beta
# model incorporating all characteristics
lm_model <- lm(beta ~ weight + height + sex, data = data)
# Residual plots
par(mfrow = c(2, 2), mar = c(5, 5, 3, 2), oma = c(0, 0, 5, 0), cex = 2)
plot(lm_model, which = 1, caption = "", main = "Residuals vs Fitted",
sub.caption = "", cex.main = 1.5)
plot(lm_model, which = 2, caption = "", main = "Q-Q Residuals",
sub.caption = "", cex.main = 1.5)
plot(lm_model, which = 3, caption = "", main = "Scale Location",
sub.caption = "", cex.main = 1.5)
plot(lm_model, which = 5, caption = "", main = "Residuals vs Leverage",
sub.caption = "", cex.main = 1.5)
# Try modelling square root of beta (residual plots indicate
# a slight quadratic mean-variance relationship)
sqrt_lm_model <- lm(sqrt(beta) ~ weight + height + sex, data = data)
# Residual plots
par(mfrow = c(2, 2), mar = c(5, 5, 3, 2), oma = c(0, 0, 5, 0), cex = 2)
plot(sqrt_lm_model, which = 1, caption = "", main = "Residuals vs Fitted",
sub.caption = "", cex.main = 1.5)
plot(sqrt_lm_model, which = 2, caption = "", main = "Q-Q Residuals",
sub.caption = "", cex.main = 1.5)
plot(sqrt_lm_model, which = 3, caption = "", main = "Scale Location",
sub.caption = "", cex.main = 1.5)
plot(sqrt_lm_model, which = 5, caption = "", main = "Residuals vs Leverage",
sub.caption = "", cex.main = 1.5)
# quantile regression approach
# fit models
lm_model <- lm(beta ~ weight + height + sex, data = data)
rq_model <- rq(beta ~ weight + height + sex, data = data, tau = 0.025)
# predictions for linear model
pred_int <- predict(lm_model, newdata = data, interval = "prediction",
level = 0.95)
# create df for comparing predicted vs actual
pred_data <- data %>%
mutate(
lm_pred_mean = pred_int[, "fit"], # mean OLS prediction
lm_pred_lwr = pred_int[, "lwr"], # 2.5% prediction interval (lower)
lm_pred_upr = pred_int[, "upr"] # 97.5% prediction interval (upper)
)
# add empirical 2.5% quantile of residuals to find 2.5% quantile of linear model
resid_q025 <- quantile(residuals(lm_model), probs = 0.025)
pred_data <- pred_data %>%
mutate(lm_q025 = lm_pred_mean + resid_q025)
# quantile regression predictions
pred_data$rq_pred <- predict(rq_model, newdata = data)
# reshape
plot_data <- pred_data %>%
pivot_longer(
cols = c(lm_q025, rq_pred),
names_to = "model",
values_to = "predicted_beta"
)
# plot
quantile_v_lm_plot <- ggplot() +
# Points for models (mean LM, quantile LM, quantile regression)
geom_point(
data = plot_data,
aes(x = beta, y = predicted_beta, color = model),
alpha = 0.6, size = 2
) +
# Linear trend lines for clarity
geom_smooth(
data = plot_data,
aes(x = beta, y = predicted_beta, color = model),
method = lm, se = TRUE, linewidth = 1
) +
# Manual colors and labels
scale_color_manual(
values = c(
"lm_q025" = "gold",
"rq_pred" = "darkred"
),
labels = c(
"lm_q025" = "Linear Model 2.5%",
"rq_pred" = "Quantile Linear Model 2.5%"
)
) +
# Labels and theme
labs(
title = expression("Figure 12: Actual vs Predicted 2.5% Quantile of " * beta),
x = expression("Actual " * beta),
y = expression("Predicted " * beta),
color = "Model Type"
)
# %%%%%%%%%%%%%%%%%%%%%% APPLYING VARIOUS APPROACHES %%%%%%%%%%%%%%%%%%%%%%%%%
# setup from question
test_person <- data.frame(weight = 70, height = 160, age = 70, sex = "female")
Ct <- 0.15
t <- 2
# empirical quantiles for Beta60 (population approach)
beta_pop <- quantile(data$beta, probs = c(0.025, 0.5, 0.975))
# C0 values using population quantiles
C0_pop <- Ct + beta_pop * t
# gamma fitted whole population distribution
gamma_quantiles <- qgamma(c(0.025, 0.5, 0.975), gamma_shape, gamma_rate)
C0_gamma <- Ct + gamma_quantiles * t
# gamma fitted to males
m_gamma_quantiles <- qgamma(c(0.025, 0.5, 0.975), m_gamma_shape, m_gamma_rate)
C0_gamma_m <- Ct + m_gamma_quantiles * t
# gamma fitted to females
f_gamma_quantiles <- qgamma(c(0.025, 0.5, 0.975), f_gamma_shape, f_gamma_rate)
C0_gamma_f <- Ct + f_gamma_quantiles * t
# predicted Beta using regression model
lm_pred_vals <- predict(lm_model, newdata = test_person,
interval = "prediction", level = 0.95)
# C0 values from regression model
beta_pred_lm <- as.numeric(lm_pred_vals)
names(beta_pred_lm) <- c("Fitted", "Lower (PI 2.5%)", "Upper (PI 97.5%)")
C0_lm <- Ct + beta_pred_lm * t
# predicted beta using quantile regression model
rq_pred_vals <- predict(rq_model, newdata = test_person,
interval = "confidence", level = 0.95)
C0_rq <- Ct + rq_pred_vals * t
# combine results into one comparison table
results_table <- tibble(
Approach = c("Population (empirical quantiles)", "Gamma Fitted",
"Male Gamma Fitted", "Female Gamma Fitted", "Linear Regression",
"Quantile Regression"),
Lower_2.5 = c(C0_pop[1], C0_gamma[1], C0_gamma_m[1],
C0_gamma_f[1], C0_lm[2], C0_rq[2]),
Central = c(C0_pop[2], C0_gamma[2], C0_gamma_m[2],
C0_gamma_f[2], C0_lm[1], C0_rq[1]),
Upper_97.5 = c(C0_pop[3], C0_gamma[3], C0_gamma_m[3],
C0_gamma_f[3], C0_lm[3], C0_rq[3])
)
kable(results_table, digits = 3,
col.names = c("Approach", "Lower (2.5%)", "Central", "Upper (97.5%)"),
caption = "Table 4: Comparison of estimated C₀ values
for the various approaches")
# 1. Current approach (fixed beta)
beta_fixed <- 0.126
C0_fixed <- Ct + beta_fixed * t
p_fixed <- as.numeric(C0_fixed > x_limit)
# 2. Gamma model (use report MLEs)
gamma_shape <- 31.9538
gamma_rate <- 173.7079
beta_q025_gamma <- qgamma(0.025, shape = gamma_shape, rate = gamma_rate)
beta_q975_gamma <- qgamma(0.975, shape = gamma_shape, rate = gamma_rate)
C0_gamma_lower <- Ct + beta_q025_gamma * t
C0_gamma_upper <- Ct + beta_q975_gamma * t
C0_gamma_mean <- mean(c(C0_gamma_lower, C0_gamma_upper))
# Probability analytically from Gamma CDF
beta_threshold <- (x_limit - Ct) / t
p_gamma <- 1 - pgamma(beta_threshold, shape = gamma_shape, rate = gamma_rate)
# 3. Quantile regression approach (population-average bands)
# Fit lower/upper quantile regression for beta
rq_lower <- rq(beta ~ weight + age + sex, data = data, tau = 0.025)
rq_upper <- rq(beta ~ weight + age + sex, data = data, tau = 0.975)
# Predict β quantiles for each observation in the dataset
beta_q025_qr <- as.numeric(predict(rq_lower, newdata = data))
beta_q975_qr <- as.numeric(predict(rq_upper, newdata = data))
# Summarize Co interval by averaging β quantiles across the population
C0_qr_lower <- Ct + mean(beta_q025_qr, na.rm = TRUE) * t
C0_qr_upper <- Ct + mean(beta_q975_qr, na.rm = TRUE) * t
C0_qr_mean <- mean(c(C0_qr_lower, C0_qr_upper))
# Approximate P(Co > limit) under QR
set.seed(123)
n_sim <- 10000
# Sample indices and uniform draws between predicted bounds
idx <- sample(seq_along(beta_q025_qr), size = n_sim, replace = TRUE)
beta_samples_qr <- runif(n_sim, min = beta_q025_qr[idx], max = beta_q975_qr[idx])
C0_samples_qr <- Ct + beta_samples_qr * t
p_qr <- mean(C0_samples_qr > x_limit, na.rm = TRUE)
# comparison table
results_df <- tibble(
Method = c("Current (Fixed β = 0.126)", "Gamma model", "Quantile regression"),
Lower = c(C0_fixed, C0_gamma_lower, C0_qr_lower),
Upper = c(C0_fixed, C0_gamma_upper, C0_qr_upper),
Mean = c(C0_fixed, C0_gamma_mean, C0_qr_mean),
Prob = c(p_fixed, p_gamma, p_qr)
)
# Graphical comparison
p <- ggplot(results_df, aes(y = Method)) +
# Interval bands (thin lines)
geom_segment(aes(x = Lower, xend = Upper, y = Method, yend = Method,
colour = Method), linewidth = 2) +
# Endpoints for clarity
geom_point(aes(x = Lower, colour = Method), size = 3) +
geom_point(aes(x = Upper, colour = Method), size = 3) +
# Mean point
geom_point(aes(x = Mean, colour = Method), size = 10, shape = 18) +
# Legal limit line
geom_vline(xintercept = x_limit, linetype = "dashed",
colour = "darkgreen", linewidth = 1) +
annotate("text", x = x_limit + 0.01, y = 3.3,
label = "Legal limit = 0.47 g/kg",
colour = "darkgreen", hjust = 0, size = 9) +
annotate("rect", xmin = x_limit - 0.005, xmax = x_limit + 0.005,
ymin = 0.5, ymax = 3.5, alpha = 0.1, fill = "darkgreen") +
# Probability labels (skip Current approach)
geom_text(
data = results_df %>% filter(Method != first(Method)),
aes(x = Upper, y = Method,
label = sprintf("P(C[0] > 0.47) == %.3f", Prob)),
parse = TRUE,
hjust = 1,
vjust = 1.5,
size = 9,
colour = "black"
) +
scale_x_continuous(name = expression(C[0]~"(g/kg)"),
limits = c(0.35, 0.72),
expand = c(0, 0)) +
scale_colour_manual(values = c("Current (Fixed β=0.126)" = "grey60",
"Gamma model" = "firebrick",
"Quantile regression" = "hotpink"),
name = "Method") +
ggtitle(expression("Figure 13: Comparison of reporting frameworks for " * C[0]))
# Recommended Framework
# Create summary table
summary_table <- data.frame(
Quantity = c(
"Measured BAC (Ct)",
"Time since driving (t)",
"Legal limit",
"Estimated C₀ (central)",
"95% uncertainty interval",
"Probability C₀ exceeded legal limit"
),
Value = c(
"0.15 g/kg",
"2 hours",
"0.47 g/kg",
"0.43 g/kg",
"(0.34, 0.51) g/kg",
"74%"
)
)
kable(
summary_table,
caption = "Table 5: Standardised Summary Table (Quantile Regression Approach)",
col.names = c("Quantity", "Value"),
align = c("l", "c"),
format = "markdown"
)
# %%%%%%%%%%%%%%%%%%%%%%% INVESTIGATING V_d %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# define A, Co, Vd
data$A <- data$`Amount of Alcohol Consumed (g)`
data$Co <- data$`Co (g/Kg)`
data$Vd <- data$A/(data$Co *data$weight)
# view summary and quantiles of Vd
summary(data$Vd)
quantile(data$Vd, probs = c(0.025, 0.5, 0.975))
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Vd EDA %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# define A, Co, Vd
data$A <- data$`Amount of Alcohol Consumed (g)`
data$Co <- data$`Co (g/Kg)`
data$Vd <- data$A/(data$Co *data$weight)
# view summary and quantiles of Vd
#summary(data$Vd)
#quantile(data$Vd, probs = c(0.025, 0.5, 0.975))
# histogram of Vd
density_plot <- ggplot(data, aes(x = Vd)) +
geom_histogram(aes(y = after_stat(density)),
binwidth = 0.02,
fill = "steelblue",
alpha = 0.55) +
geom_density(color = "navy",
size = 3) +
geom_vline(aes(xintercept=mean(Vd)),
linetype = "dashed",
color = "red",
size = 2) +
geom_vline(aes(xintercept=quantile(Vd, 0.025)),
linetype = "dotted",
color = "red",
size = 2) +
geom_vline(aes(xintercept=quantile(Vd, 0.975)),
linetype = "dotted",
color = "red",
size = 2) +
labs(x = "Vd (L/kg)", y = "Density",
title = expression("Figure 14: Distribution of Vd"))
# Vd vs weight
weight_plot2 <- ggplot(data, aes(x = weight, y = Vd, colour = sex)) +
geom_point(size = 3) +
geom_smooth(method = "lm", colour = "navy") +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
labs(title = "Figure 15: Vd vs Weight", x = "Weight (kg)", y = "Vd (L/kg)")
# Vd vs height
height_plot2 <- ggplot(data, aes(x = height, y = Vd, colour = sex)) +
geom_point(size = 3) +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
geom_smooth(method = "lm", colour = "navy") +
labs(title = "Figure 16: Vd vs Height", x = "Height (cm)", y = "Vd (L/kg)")
# Vd vs age
age_plot2 <- ggplot(data, aes(x = age, y = Vd, colour = sex)) +
geom_point(size = 3) +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
geom_smooth(method = "lm", colour = "navy") +
labs(title = "Figure 17: Vd vs Age", x = "Age (years)", y = "Vd (L/kg)")
# Vd vs gender
sex_plot2 <- ggplot(data, aes(x = sex, y = Vd,
fill = sex)) +
geom_violin(alpha = 0.8) +
geom_boxplot(width = 0.2, color = "navy", alpha = 0.7) +
scale_fill_manual(values = c("seagreen", "skyblue")) +
labs(
title = "Figure 18: Vd vs Gender",
x = "Gender",
y = "Vd (L/kg)"
) +
guides(fill = "none")
# Test correlation to investigate independence assumption
corr_test <- cor.test(data$beta, data$Vd, use = "complete.obs")
# Extract correlation coefficient and p-value
corr_coef <- corr_test$estimate
p_value <- corr_test$p.value
# View quantiles of each coefficient for comparison
beta_range <- quantile(data$beta, probs = c(0.025, 0.975))
Vd_range <- quantile(data$Vd, probs = c(0.025, 0.975))
# Scatter plot between beta and Vd
ggplot(data, aes(x = beta, y = Vd, colour = sex)) +
geom_point(size = 3) +
scale_colour_manual(values = c("male" = "skyblue", "female" = "seagreen")) +
geom_smooth(method = "lm", color = "navy") +
labs(
title = "Figure 19: Relationship between β and Vd",
subtitle = paste0(
"Correlation = ", round(corr_coef, 3),
", p-value = ", format.pval(p_value, digits = 5, eps = .00001)
),
x = "β elimination rate (g/kg/h)",
y = "Vd (L/kg)"
)
# Visualize the joint distribution
ggplot(data, aes(x = beta, y = Vd)) +
geom_point(alpha = 0.6, color = "orange", size = 3) +
geom_density_2d(color = "red", linewidth = 1.2) +
# Add the marginal 97.5th percentiles
geom_vline(xintercept = quantile(data$beta, 0.975, na.rm = TRUE),
linetype = "dashed", color = "blue", linewidth = 1.5) +
geom_hline(yintercept = quantile(data$Vd, 0.975, na.rm = TRUE),
linetype = "dashed", color = "blue", linewidth = 1.5) +
labs(
title = "Figure 20: Joint Distribution of β and Vd",
subtitle = "Blue lines show marginal 97.5th percentiles",
x = "β (g/kg/h)",
y = "V_d (L/kg)"
) +
annotate("text",
x = quantile(data$beta, 0.975, na.rm = TRUE),
y = min(data$Vd, na.rm = TRUE),
label = "97.5th percentile β", size = 7,
hjust = -0.1, color = "blue")
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% ALT APPROACH %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Test person
A <- mean(data$A)
weight <- 70
t <- 2
# Calculate Ct using empirical joint distribution
data$Ct_joint <- (A / (weight * data$Vd)) - data$beta * t
# Compute the quantiles of C_t
ct_quantiles <- quantile(data$Ct_joint, probs = c(0.025, 0.5, 0.975),
na.rm = TRUE)
# Current (independent) approach
beta_ind <- quantile(data$beta, 0.975, na.rm = TRUE)
Vd_ind <- quantile(data$Vd, 0.975, na.rm = TRUE)
Ct_independent <- (A / (weight * Vd_ind)) - beta_ind * t
# Table comparison
results_compare <- tibble(
Method = c("Empirical joint (β, Vd)", "Independent 97.5th percentiles"),
Lower_2.5 = c(round(quantile(data$Ct_joint, 0.025, na.rm = TRUE), 3), ""),
Median = c(round(quantile(data$Ct_joint, 0.5, na.rm = TRUE), 3), ""),
Upper_97.5 = c(round(quantile(data$Ct_joint, 0.975, na.rm = TRUE), 3),
round(Ct_independent, 3))
)
kable(results_compare, align = "lccc",
caption = "Table 6: Comparison of Empirical Joint vs Independent Ct")